A Semantic Framework for Social Search
Abstract
Facebook = {facebook, social, network, websit, launch, februari, 2004, oper, privat, own, facebook, juli, 2010, facebook, 500, million, activ, user, person, fourteen, world, user, creat, person, profil, add, user, friend, exchang, messag, includ, automat, notif, updat, profil, addition, user, join, common, interest, user, group, organ, workplace, school, colleg, characterist, servic, stem, colloqui, book, student, start, academ, year, univers, administr, intent, help, student, facebook, declar, 13, year, regist, user, websit, facebook, found, mark, zuckerberg, colleg, roommat, fellow, comput, scienc, student, eduardo, saverin, dustin, moskovitz, chri, hugh, websit, membership, initi, limit, founder, harvard, student, expand, colleg, boston, area, ivi, leagu, stanford, univers, gradual, ad, support, student, univers, open, school, student, final, ag, 13, facebook, met, controversi, block, intermitt, countri, includ, pakistan, syria, peopl, republ, china, vietnam, iran, north, korea, ban, place, work, discourag, employe, wast, time, servic, facebook, privaci, issu, safeti, user, comprom, time, facebook, settl, lawsuit, claim, sourc, code, intellectu, properti, site, involv, controversi, sale, fan, friend, januari, 2009, comper, studi, rank, facebook, social, network, worldwid, monthli, activ, user, myspac, entertain, weekli, put, end, deca, list, earth, stalk, ex, rememb, worker, birthdai, bug, friend, plai, rous, game, scrabul, facebook}

(Note: tokenization is the first step of preprocessing in Information Retrieval; it is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. Stop words are terms that appear so frequently in text that they lose their usefulness as indexed search terms. Porter Stemming algorithm, official website: http://tartarus.org/~martin/PorterStemmer, visited June 2011.)

In order to compute the similarity between the context and a candidate concept, we define the following sets, which let us transform our data into vectors:

• Concepts(k ∈ K) = {ci : ci ∈ C}. The set of candidate concepts for a keyword k.
• History(u ∈ U) = {chistoric : chistoric ∈ C}. The set of concepts the user has used before.
• Abstract(c ∈ C) = {wj : wj ∈ W}. The set of words representing a concept ci (e.g. the abstract of Facebook above).
• Voc(k ∈ K) = ∪ Abstract(ci) : ci ∈ Concepts(k). The set of words of all the candidate concepts corresponding to the keyword k (Voc stands for vocabulary).

In addition, we also need to define the user context. The user context is every interaction the user had before the current interaction (e.g. posting a message, replying to a comment, etc.). These older interactions were processed in the same way at the time they occurred, so they were also transformed into concepts and saved in our system. The context therefore consists of normalized terms, like the vocabulary Voc(k), and is defined as follows:

• Context(u ∈ U) = ∪ Abstract(chistoric) : chistoric ∈ History(u). The set of terms of all the concepts the user has used before.
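To make these definitions concrete, the following minimal sketch shows how Abstract(c), Voc(k) and Context(u) could be built from raw text using the preprocessing steps mentioned in the note above (tokenization, stop-word removal, Porter stemming). It is only an illustration under our own naming: the thesis prescribes the Porter algorithm itself, while NLTK is assumed here purely as a convenient implementation of it.

# Minimal sketch (not the thesis's own code) of the preprocessing that
# produces the term sets defined above. NLTK is an assumed dependency;
# the stop-word list may require a one-off nltk.download('stopwords').
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

_stemmer = PorterStemmer()
_stop = set(stopwords.words('english'))

def abstract_terms(text):
    """Abstract(c): tokenize, drop stop words, Porter-stem.
    'Facebook is a social networking website' -> ['facebook', 'social', 'network', 'websit']"""
    tokens = re.findall(r"[a-z0-9]+", text.lower())    # tokenization
    tokens = [t for t in tokens if t not in _stop]     # stop-word removal
    return [_stemmer.stem(t) for t in tokens]          # Porter stemming

def voc(candidate_abstracts):
    """Voc(k): union of the abstract terms of every candidate concept of k."""
    return set().union(*map(set, candidate_abstracts))

def context(history_abstracts):
    """Context(u): union of the abstract terms of the user's past concepts."""
    return voc(history_abstracts)   # same union, taken over History(u) instead of Concepts(k)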
Now we can define, for a user, his/her context and the concepts associated with a keyword using vectors. These vectors live in ℝ^|Voc(k)|, and each position in a vector corresponds to a term of Voc(k).

• The context vector: Vcontext = (vi), where 1 ≤ i ≤ |Voc(k)| and vi = 1 if the corresponding term wi of Voc(k) appears in Context(u), otherwise vi = 0.
• The concept vector: Vci = (vi), where 1 ≤ i ≤ |Voc(k)| and vi is the frequency of the corresponding term wi of Voc(k) in the abstract of the concept ci.

By doing this, when we want to select a concept from the candidate list, we can create a Vcontext for the user's context and then a Vconcept for each concept related to the keyword. We can then compare Vcontext with each Vconcept using the cosine similarity measure. The cosine of the angle between two such vectors is a value between 0 and 1: when the angle is small the cosine tends to 1, and when the angle is large it tends to 0. The smaller the angle, the more relevant the concept. The similarity is the standard cosine similarity:

Sim(Vcontext, Vconcept) = cos θ = (Vcontext · Vconcept) / (|Vcontext| · |Vconcept|)
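A minimal sketch of this vector construction and of the cosine similarity is given below. The helper names (build_context_vector, build_concept_vector, cosine_similarity) are ours, not the thesis's code, and plain Python lists are assumed for the vectors.

# Minimal sketch (illustrative names) of the context/concept vectors over
# Voc(k) and of the cosine similarity used to compare them.
import math

def build_context_vector(voc, context_terms):
    """V_context: 1 at position i if the i-th term of Voc(k) occurs in Context(u)."""
    return [1.0 if w in context_terms else 0.0 for w in voc]

def build_concept_vector(voc, concept_abstract):
    """V_c: frequency of the i-th term of Voc(k) in the concept's abstract."""
    return [float(concept_abstract.count(w)) for w in voc]

def cosine_similarity(u, v):
    """Sim(V_context, V_concept) = (u . v) / (|u| |v|); 0 if either vector is null."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Usage: Voc(k) must be kept as an ordered list so that vector positions are stable.
voc_k = ["facebook", "social", "network", "websit", "student"]
v_ctx = build_context_vector(voc_k, {"social", "network"})
v_con = build_concept_vector(voc_k, ["facebook", "social", "network", "websit", "facebook"])
score = cosine_similarity(v_ctx, v_con)   # higher means the concept fits the context better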
Concept selection algorithm

We now know how to build a context vector and the concept vectors, and how to compare them in order to select the most relevant concepts. The concept selection procedure is described in Algorithm 2. The principal situations that may occur while executing the algorithm are the following:

• The user context does not exist:
  – Other keywords exist in the same message: we consider these keywords as the context, construct the context vector, and compare it with the concepts obtained from the DBpedia disambiguation property. This is where the two lists of concepts come into play: the list of concepts coming from the DBpedia disambiguation property is used only when we have no user context but some other keywords in the same message. By computing the similarity we obtain a score for each concept (line 10). We then look at the scores:
    ∗ The other keywords have no influence: all the scores are equal to 0, meaning that the other keywords in the message are useless, i.e. they bear no relation to the keyword we are working on. In this case the most popular concept from the DBpedia Lookup service is returned as the result (line 17).
    ∗ The other keywords give non-zero scores: in this case, we return the concepts with the highest scores (line 19).
  – No other keywords: the most popular concept from the DBpedia Lookup service is returned as the result (line 22).
• The user context exists: we add the other keywords (if any) to the context and then construct the context vector. For each concept from DBpedia Lookup, we construct a concept vector and compute the similarity. As before, we compare the scores and return the concepts with the highest scores (line 32).

Algorithm 2 ConceptSelecting(user, conceptList, popularConceptList)
1: Required: conceptList ≠ null & popularConceptList ≠ null
2: voc[] ← constructVocabulary(conceptList)
3: context[] ← getContext(user)
4: otherKeywords[] ← getOtherKeywords()
5: if (isEmpty(context[])) then
6:   if (!isEmpty(otherKeywords[])) then
7:     contextVector[] ← constructContextVector(otherKeywords[], voc[])
8:     maxScore ← 0
9:     for all (concept in conceptList[]) do
10:       conceptVector[] ← constructConceptVector(concept, voc[])
11:       sim ← getSimilarity(contextVector[], conceptVector[])
12:       conceptAndScoreList ← (concept, sim)
13:       if (sim ≥ maxScore) then
14:         maxScore ← sim
15:       end if
16:     end for
17:     if (maxScore = 0) then
18:       RETURN popularConceptList[0]
19:     else
20:       RETURN getBestConcept(conceptAndScoreList)
21:     end if
22:   else
23:     RETURN popularConceptList[0]
24:   end if
25: else
26:   context[] ← add(otherKeywords[])
27:   contextVector[] ← constructContextVector(context[], voc[])
28:   for all (concept in popularConceptList[]) do
29:     conceptVector[] ← constructConceptVector(concept, voc[])
30:     sim ← getSimilarity(contextVector[], conceptVector[])
31:     conceptAndScoreList ← (concept, sim)
32:   end for
33:   RETURN getBestConcept(conceptAndScoreList)
34: end if

Figure 4.16: Illustration of the winner concepts.

One thing to notice here is that we do not return only the concept with the highest score, but the n concepts with the n highest scores. Therefore, one or several concepts are returned, depending on how far their scores are from the others. By default, we designed our algorithm to return the concepts whose scores are higher than 80% of the highest score, to be sure that we do not miss any relevant concept (a small sketch of this rule is given below). The reason behind this choice is illustrated in Figure 4.16. The left-hand side shows the usual case of similarity scores when there is not much context in the system for a user when processing his/her message: the scores are low and there is no big difference between them. We then select not only the concept with the highest score (i.e. the concept c4) but the two concepts c3 and c4, whose scores are larger than 80% of the score of c4, and we update the context with these new concepts. The user context then keeps growing in this way and becomes more and more stable, oriented towards a few principal subjects/topics. Once the context is rich, a clear distinction between the similarity scores appears, as shown on the right-hand side of Figure 4.16.
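The sketch below illustrates this 80% winner-selection rule, roughly what the getBestConcept step of Algorithm 2 is expected to do. Function and variable names are illustrative, not the thesis's code, and the 0.8 threshold is the default mentioned above.

# Minimal sketch (illustrative, not the thesis's code) of the 80% winner rule:
# keep every concept whose similarity score is at least 80% of the best score.
def get_best_concepts(concept_and_score_list, threshold=0.8):
    """concept_and_score_list: list of (concept, score) pairs, scores in [0, 1]."""
    if not concept_and_score_list:
        return []
    best = max(score for _, score in concept_and_score_list)
    return [c for c, score in concept_and_score_list if score >= threshold * best]

# Example (left-hand situation of Figure 4.16): low, close scores,
# so both c3 and c4 are returned as winners.
winners = get_best_concepts([("c1", 0.05), ("c2", 0.06), ("c3", 0.09), ("c4", 0.10)])
# winners == ["c3", "c4"]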
In the following section, we present the second step of the profile construction process. Once the user's interactions (i.e. updates, posts, messages, etc.) are matched to DBpedia concepts, we expand these concepts to better capture the user's interests.

4.3.4 Semantic Expansion in the Knowledge Base

Similar resources
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the Web, there are many challenges in establishing a general framework for data mining and for retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology captures the main entities and concepts of the data involved in data mining. In this paper, we tried to propose a method for applying "meaning" to the search system, but the problem ...
A framework for real-time semantic social media analysis
This paper presents a framework for collecting and analysing large volumes of social media content. The real-time analytics framework comprises semantic annotation, Linked Open Data, semantic search, and dynamic result aggregation components. In addition, exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices, term clouds, tree...
Keyword-based querying for XML and RDF
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike thi...
Semantic Disambiguation and Contextualisation of Social Tags
We present an algorithmic framework to accurately and efficiently identify the semantic meanings and contexts of social tags within a particular folksonomy. The framework is used for building contextualised tag-based user and item profiles. We also present its implementation in a system called cTag, with which we preliminarily analyse the semantic meanings and contexts of tags belonging to Delicious ...
Exploiting Social Property for Improving Distributed Semantic Search
To locate desirable Semantic Web data in a distributed network, the discovery mechanism has to be not only semantically rich, in order to cope with complex queries, but also scalable enough to handle large numbers of information sources. In this paper, we propose a novel scheme that exploits social properties of humans, such as natural grouping and peer recommendation between people with common in...
Hybrid Social Networking Application for a University Community
A hybrid social network for building social communities for a university community is presented. The system employs a semantic ontology for an offline/online social network site (SNS) using a Mobile Ad Hoc Network. It captures the core features of an SNS, including profile creation, friend invitation/search, group formation, chatting/messaging, blogging and voting. Three core frameworks – the pee...